Toxic Comment Detection

1 Introduction and Summary

This project aims to build a comment classifier that assigns specific toxicity labels, such as insult or identity hate, to text comments.

Throughout the project, various experiments were conducted with pre-trained NLP models, including RoBERTa and DistilBERT. The best model achieved an AUC-PR of 0.693 and a macro-average F1-score of 0.67.

Imports:

Show the code
%reload_ext autoreload
%autoreload 1
import matplotlib.pyplot as plt
from sklearn.metrics import (
    f1_score,
    precision_score,
    recall_score,
    classification_report
)
from skmultilearn.model_selection.iterative_stratification import (
    iterative_train_test_split,
)
from model.model import CommentClassifier, TunerDynamicUnfreeze
from model.data_loader import CommentDataModule
import auxiliary_functions.auxiliary_functions as aux
from transformers import AutoTokenizer
import pytorch_lightning as ptl
import torch
import joblib
import zipfile
import os
import polars as pl
import seaborn as sns
from matplotlib import ticker
from tabulate import tabulate
from IPython.display import Markdown, display
from langdetect import detect_langs
import numpy as np
from IPython.display import clear_output
from lime.lime_text import LimeTextExplainer
%aimport model.model
%aimport model.data_loader
%aimport auxiliary_functions.auxiliary_functions

Options:

Show the code
BASE_FIG_SIZE = (8.5, 4.5)
np.random.seed(1)

1.1 Data Loading

Downloading the data set:

!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge -p data

Unzip and store files:

zip_file_path = "data/jigsaw-toxic-comment-classification-challenge.zip"
with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
    zip_ref.extractall("data")
os.remove(zip_file_path)

zip_files = os.listdir("data")
for zip_file in zip_files:
    zip_path = f"data/{zip_file}"
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall("data")
    os.remove(zip_path)

Read the data:

Show the code
train_data = pl.read_csv("data/train.csv")

2 Exploratory Data Analysis

Number of samples in the training data:

Show the code
len(train_data)
159571

Overview:

Show the code
train_data.head()
shape: (5, 8)
id comment_text toxic severe_toxic obscene threat insult identity_hate
str str i64 i64 i64 i64 i64 i64
"0000997932d777… "Explanation Wh… 0 0 0 0 0 0
"000103f0d9cfb6… "D'aww! He matc… 0 0 0 0 0 0
"000113f07ec002… "Hey man, I'm r… 0 0 0 0 0 0
"0001b41b1c6bb3… "" More I can't… 0 0 0 0 0 0
"0001d958c54c6e… "You, sir, are … 0 0 0 0 0 0
Show the code
toxicity_labels = train_data.columns[-6:]

Are there duplicate IDs?

Show the code
train_data["id"].is_duplicated().any()
False

Are there duplicate comments?

Show the code
train_data["comment_text"].is_duplicated().any()
False

2.1 Class Balance

Percentage of comments with each label:

Show the code
fig_classes, ax_classes = plt.subplots(figsize=BASE_FIG_SIZE)
class_proportion = train_data[:, 2:].sum().to_numpy()[0] / len(train_data)
sns.barplot(x=class_proportion * 100, y=train_data.columns[2:], ax=ax_classes)
ax_classes.xaxis.set_major_formatter(ticker.PercentFormatter())
ax_classes.set_xlabel("Percentage of Comments with Label")
Text(0.5, 0, 'Percentage of Comments with Label')

The toxicity labels are rare; the least frequent labels are present in less than 1% of the comments.

Can comments have more than one label?

Show the code
label_sums = train_data[:, 2:].sum_horizontal().value_counts()
label_sums.columns = ["Total labels", "Number of Comments"]
label_sums.sort("Number of Comments")
shape: (7, 2)
Total labels Number of Comments
i64 u32
6 31
5 385
4 1760
2 3480
3 4209
1 6360
0 143339

Comments can have more than one label.

Fraction of comments with no labels:

Show the code
round(
    (
        label_sums.filter(pl.col("Total labels") == 0)["Number of Comments"]
        / len(train_data)
    ).item(),
    2,
)
0.9

90% of the comments are benign.
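
This imbalance is later handled by undersampling benign comments during training (the under_sample option used in the experiments below). A minimal polars sketch of the idea, with a hypothetical helper that is not the project's actual implementation:

def undersample_benign(df: pl.DataFrame, frac: float, seed: int = 1) -> pl.DataFrame:
    # A comment is benign if every one of the six label columns is zero
    is_benign = df[:, -6:].sum_horizontal() == 0
    benign = df.filter(is_benign).sample(fraction=frac, seed=seed)
    toxic = df.filter(~is_benign)
    # Keep all toxic comments, but only `frac` of the benign ones
    return pl.concat([toxic, benign])

# e.g. keep 10% of benign comments, mirroring under_sample=0.1
# train_subset = undersample_benign(train_data, frac=0.1)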

2.2 Comment Length

Comment Length Distribution:

Show the code
fig_comm_len, ax_com_len = plt.subplots(figsize=BASE_FIG_SIZE)
comment_lengths = train_data["comment_text"].str.len_chars()
sns.histplot(comment_lengths, bins=50, ax=ax_com_len)
ax_com_len.set_xlabel("Comment Length in Symbols")
plt.show()

Most comments are shorter than 1,000 characters, and comment length is capped at 5,000 characters.

Filtering out comments with no letters:

Show the code
train_data = train_data.filter(train_data["comment_text"].str.contains("[a-zA-Z]"))

Number of tokens:

Show the code
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
comment_tokenized_lengths = train_data["comment_text"].map_elements(
    lambda x: len(tokenizer(x)["input_ids"])
)
clear_output()
fig_no_tokens, ax_no_tokens = plt.subplots(figsize=BASE_FIG_SIZE)
sns.boxplot(
    comment_tokenized_lengths.filter(
        train_data[:, -6:].sum_horizontal() == 0
    ).to_numpy(),
    log_scale=True,
    ax=ax_no_tokens,
)
sns.boxplot(
    y=comment_tokenized_lengths.filter(
        train_data[:, -6:].sum_horizontal() != 0
    ).to_numpy(),
    log_scale=True,
    ax=ax_no_tokens,
    x=1,
)
ax_no_tokens.set_xticks([0, 1])
ax_no_tokens.set_xticklabels(["Non-toxic", "Toxic"])
ax_no_tokens.set_ylabel("Number of Tokens")
plt.show()

To examine the number of tokens per comment, the comments were tokenized with the uncased BERT tokenizer. Toxic comments appear to have a lower median token count than benign comments.

90th percentile of tokenized comment length:

Show the code
int(
    comment_tokenized_lengths.filter(train_data[:, -6:].sum_horizontal() != 0).quantile(
        0.90
    )
)
150

To strike a good compromise between the model's ability to capture long-range token relationships and processing time, the max_length parameter of the tokenizers will be set to the 90th percentile of toxic comment lengths.
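
For illustration, passing this cap to a Hugging Face tokenizer looks as follows (truncation and padding arguments are standard transformers options; the value mirrors tokenizer_max_len = 150 used in the configs below):

# Truncate long comments to 150 tokens and pad shorter ones,
# so every batch has a fixed sequence length.
encoded = tokenizer(
    "Example comment text",
    max_length=150,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
encoded["input_ids"].shape  # torch.Size([1, 150])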

2.3 Language Detection

Using a language detection model to predict the language in each comment:

Show the code
def detect_language(text):
    # langdetect raises an exception on empty or purely non-alphabetic text
    try:
        result = detect_langs(text)[0].lang
    except Exception:
        result = "empty"
    return result


languages = train_data["comment_text"].map_elements(detect_language)
joblib.dump(languages, "temp/languages.joblib")

Loading the predictions:

Show the code
languages = joblib.load("temp/languages.joblib")

Major non-english languages detected:

Show the code
train_data.with_columns(languages.alias("lang")).filter(pl.col("lang") != "en")[
    "lang"
].value_counts().sort("counts", descending=True).head()
shape: (5, 2)
lang counts
str u32
"de" 571
"fr" 373
"af" 344
"so" 275
"id" 269

According to the language detection model, a small number of comments may not be in English and would therefore require different NLP models.

Inspecting some of the predicted non-english comments:

Show the code
aux.table_display(
    train_data.with_columns(languages.alias("lang"))
    .filter(pl.col("lang") != "en")[["comment_text", "lang"]]
    .head(20),
    tablefmt="html",
)
comment_text lang
REDIRECT Talk:Voydan Pop Georgiev- Chernodrinski da
REDIRECT Talk:Frank Herbert Mason de
Oh, it's me vandalising?xD See here. Greetings, af

Azari or Azerbaijani?

Azari-iranian,azerbaijani-turkic nation.
sl
86.29.244.57|86.29.244.57]] 04:21, 14 May 2007 tl
Future Perfect at Sunrise|☼]] 14:59, 16 ro
REDIRECT Talk:José Manuel Rojas et

Valerie Poxleitner

Valeri Poxleitner, A.K.A. Lights. If
de
|listas = Manos Family es
Barnes Aus 1 1 8 de
06:15, 19 Aug 2004 (UTC) de
P.S. Are you a /b/tard? ca
" No problem at all!  (talk) " no
I've just seen that nl

Batman

I am Batman. You are Spiderman. I win.
id
WikiDon, STOP stalking me! af
Type 3 looks gorgeous ) (talk) af
|listas = Schaefer, Nolan de

"::I LOL'd hardest at J.delanoy's. P Cobra

"
ca
REDIRECT Talk:Jeopardy! (video games) et

Upon closer inspection, most of these comments are in fact in English and were flagged otherwise only because of the detection model's imperfections. Therefore, no comments will be omitted.

3 Modeling

Class interaction schema:

The data preprocessing and model training pipeline is implemented with several custom classes that inherit from PyTorch and PyTorch Lightning classes. The setup is illustrated in the schema above. Parameters highlighted in red are adjustable through a configuration file, which is used to instantiate a dataloader, a model, and a trainer (a rough skeleton of such a module is sketched below).
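
The source of CommentClassifier and CommentDataModule is not reproduced in this report; as a rough, hypothetical skeleton (class and argument names here are illustrative only), such a Lightning module typically wraps a pre-trained backbone with a per-label classification head:

import pytorch_lightning as ptl
import torch
from torch import nn
from transformers import AutoModel


class ToyCommentClassifier(ptl.LightningModule):
    """Illustrative skeleton only; not the project's CommentClassifier."""

    def __init__(self, model_name: str, n_labels: int = 6, learning_r: float = 1e-4):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.backbone.config.hidden_size, n_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()
        self.learning_r = learning_r

    def forward(self, input_ids, attention_mask):
        # Use the first-token hidden state as the comment representation
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(hidden.last_hidden_state[:, 0])

    def training_step(self, batch, batch_idx):
        logits = self(batch["input_ids"], batch["attention_mask"])
        loss = self.loss_fn(logits, batch["labels"].float())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.learning_r)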

The parameters in the configuration file are as follows:

  • data: polars DataFrame or string path to a data directory
  • batch_size: batch size
  • model: name of the model passed to the AutoModel and AutoTokenizer .from_pretrained() functions
  • tokenizer_max_len: maximum length to which tokenized text is padded or truncated
  • class_weights: "balanced" or None. "balanced" weighs each label based on its frequency, assigning higher weights to rare labels (see the sketch after this list)
  • learning_r: initial learning rate
  • stop_patience: number of epochs to continue after the stopping delta limit is reached
  • stop_delta: smallest improvement in validation loss between epochs; if the improvement falls below this value, training is stopped
  • unfreeze_delta: smallest improvement in validation loss between epochs; if the improvement falls below this value, all model layers are unfrozen
  • tuning_lr: learning rate of the model's backbone
  • max_epochs: maximum number of epochs if training is not stopped earlier
  • dropout: if not None, a float giving the dropout rate of an extra dropout layer placed before the final classification layer
  • under_sample: use only this fraction of benign comments in the training data
  • train_frac: use only this fraction of the training data
  • val_frac: use only this fraction of the validation data
  • test_frac: use only this fraction of the test data
  • name: experiment name
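
To illustrate the class_weights option, per-label weights can be derived from label frequencies and passed to the loss, for example as the pos_weight of BCEWithLogitsLoss; the exact weighting scheme used inside CommentClassifier may differ:

# Inverse-frequency weights: rare labels get proportionally larger weights,
# so their errors contribute more to the loss.
labels_np = train_data[:, -6:].to_numpy()
pos_counts = labels_np.sum(axis=0)
neg_counts = len(labels_np) - pos_counts
pos_weight = torch.tensor(neg_counts / pos_counts, dtype=torch.float)
weighted_loss = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)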

Load tensorboard:

Show the code
%load_ext tensorboard

3.1 Experiment 1: DistilBERT

The first experiment uses DistilBERT as a base model.

Set Experiment 1 config and train the model:

Show the code
config = {
    "data": train_data,
    "batch_size": 16,
    "model": "distilbert-base-uncased",
    "tokenizer_max_len": 150,
    "class_weights": "balanced",
    "learning_r": 1e-4,
    "stop_patience": 2,
    "stop_delta": 1e-5,
    "unfreeze_delta": 1e-4,
    "tuning_lr": 1e-5,
    "max_epochs": 100,
    "dropout": None,
    "under_sample": 0.1,
    "train_frac": 1,
    "val_frac": 0.3,
    "test_frac": 1,
    "name": "distilbert_bs16_lr5e-4_w_166maxl_undersampling0.1_testsmall",
}

data_loader, model, trainer = aux.set_model_from_config(config=config)

trainer.fit(model, datamodule=data_loader)

Evaluate the model:

Show the code
all_metrics = {}
test_loader = data_loader.test_dataloader()
preds_distilbert_01 = trainer.predict(model, test_loader)
true_labels_distilbert_01 = torch.Tensor(test_loader.dataset.labels.to_numpy())
preds_distilbert_01 = torch.cat([batch.sigmoid() for batch in preds_distilbert_01], dim=0)
clear_output()
metrics_disbert_v01 = aux.evaluate_model(
    true_labels=true_labels_distilbert_01,
    predictions=preds_distilbert_01,
    labels=train_data.columns[-6:],
    print_metrics=True,
)
all_metrics["distilbert_01"] = metrics_disbert_v01

3.2 Experiment 2 RoBERTa

In the hope that the added complexity of RoBERTa improves performance, it is used next with the same hyperparameters.

Set Experiment 2 config and train the model:

Show the code
config_roberta_01 = {
    "data": train_data,
    "batch_size": 16,
    "model": "roberta-base",
    "tokenizer_max_len": 150,
    "class_weights": "balanced",
    "learning_r": 1e-4,
    "stop_patience": 2,
    "stop_delta": 1e-5,
    "unfreeze_delta": 1e-4,
    "tuning_lr": 1e-5,
    "max_epochs": 100,
    "dropout": None,
    "under_sample": 0.1,
    "train_frac": 1,
    "val_frac": 0.3,
    "test_frac": 1,
    "name": "roberta_bs16_lr1e-4_w_166maxl_undersampling0.1",
}
data_loader_roberta, model_roberta, trainer_roberta = aux.set_model_from_config(
    config=config_roberta_01
)

trainer_roberta.fit(model_roberta, datamodule=data_loader_roberta)

Evaluate the model:

Show the code
test_loader_roberta = data_loader_roberta.test_dataloader()
preds_roberta = trainer_roberta.predict(model_roberta, test_loader_roberta)
preds_roberta = torch.cat([batch.sigmoid() for batch in preds_roberta], dim=0)
true_labels_roberta = torch.Tensor(test_loader_roberta.dataset.labels.to_numpy())
clear_output()
metrics_roberta = aux.evaluate_model(
    predictions=preds_roberta,
    true_labels=true_labels_roberta,
    labels=train_data.columns[-6:],
    print_metrics=True,
)
all_metrics["roberta_01"] = metrics_roberta

3.3 Experiment 3: DistilBERT with a lower learning rate and batch size

Using a more complex model did not improve the results and increased training time. Therefore, the next experiment returns to DistilBERT, this time with a lower learning rate and a smaller batch size.

Set Experiment 3 config and train the model:

Show the code
config_distilbert_02 = {
    "data": train_data,
    "batch_size": 8,
    "model": "distilbert-base-uncased",
    "tokenizer_max_len": 150,
    "class_weights": "balanced",
    "learning_r": 5e-5,
    "stop_patience": 2,
    "stop_delta": 1e-5,
    "unfreeze_delta": 1e-4,
    "tuning_lr": 5e-6,
    "max_epochs": 100,
    "dropout": None,
    "under_sample": 0.1,
    "train_frac": 1,
    "val_frac": 0.3,
    "test_frac": 1,
    "name": "distilbert_bs8_lr5e-5_w_166maxl_undersampling0.1",
}
(
    data_loader_distilbert_02,
    model_distilbert_02,
    trainer_distilbert_02,
) = aux.set_model_from_config(config=config_distilbert_02)

trainer_distilbert_02.fit(model_distilbert_02, data_loader_distilbert_02)

Evaluate the model:

Show the code
test_loader_distilbert_02 = data_loader_distilbert_02.test_dataloader()
preds_distilbert_02 = trainer_distilbert_02.predict(
    model_distilbert_02, test_loader_distilbert_02
)
preds_distilbert_02 = torch.cat(
    [batch.sigmoid() for batch in preds_distilbert_02], dim=0
)
true_labels_distilbert_02 = torch.Tensor(
    test_loader_distilbert_02.dataset.labels.to_numpy()
)
clear_output()
metrics_distilbert_02 = aux.evaluate_model(
    predictions=preds_distilbert_02,
    true_labels=true_labels_distilbert_02,
    labels=train_data.columns[-6:],
    print_metrics=True,
)
all_metrics["distilbert_02"] = metrics_distilbert_02

3.4 Experiment 4: DistilBERT with an additional dropout layer

Lowering the learning rate and the batch size slightly improved performance. Next, an extra dropout layer with a rate of 0.25 is added before the final classification layer to further regularize the model.
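
As a minimal sketch, the addition can be thought of as a dropout layer placed between the pooled backbone output and the per-label classification layer (the hidden size and layer names here are illustrative; the actual change is made inside CommentClassifier via the dropout config entry):

from torch import nn

# Hypothetical classification head with the extra dropout layer: dropout
# randomly zeroes parts of the pooled representation during training,
# regularizing the final per-label logits.
hidden_size = 768  # DistilBERT hidden size
n_labels = 6
classification_head = nn.Sequential(
    nn.Dropout(p=0.25),
    nn.Linear(hidden_size, n_labels),
)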

Set Experiment 4 config and train the model:

Show the code
config_distilbert_03 = {
    "data": train_data,
    "batch_size": 8,
    "model": "distilbert-base-uncased",
    "tokenizer_max_len": 150,
    "class_weights": "balanced",
    "learning_r": 5e-5,
    "stop_patience": 2,
    "stop_delta": 1e-5,
    "unfreeze_delta": 1e-4,
    "tuning_lr": 5e-6,
    "max_epochs": 100,
    "dropout": 0.25,
    "under_sample": 0.1,
    "train_frac": 1,
    "val_frac": 0.3,
    "test_frac": 1,
    "name": "distilbert_bs8_lr5e-5_w_166maxl_undersampling0.1_dropout025",
}
(
    data_loader_distilbert_03,
    model_distilbert_03,
    trainer_distilbert_03,
) = aux.set_model_from_config(config=config_distilbert_03)
trainer_distilbert_03.fit(model_distilbert_03, data_loader_distilbert_03)

Evaluate the model:

Show the code
test_loader_distilbert_03 = data_loader_distilbert_03.test_dataloader()
preds_distilbert_03 = trainer_distilbert_03.predict(
    model_distilbert_03, test_loader_distilbert_03
)
preds_distilbert_03 = torch.cat(
    [batch.sigmoid() for batch in preds_distilbert_03], dim=0
)
true_labels_distilbert_03 = torch.Tensor(
    test_loader_distilbert_03.dataset.labels.to_numpy()
)
clear_output()
metrics_distilbert_03 = aux.evaluate_model(
    predictions=preds_distilbert_03,
    true_labels=true_labels_distilbert_03,
    labels=train_data.columns[-6:],
    print_metrics=True,
)
all_metrics["distilbert_03"] = metrics_distilbert_03

Adding an extra dropout layer did not improve the model's performance, but it did speed up training, with the model converging in fewer epochs.

Get the number of epochs for each model:

all_metrics["distilbert_01"]["Training Epochs"] = trainer.callbacks[4].stopped_epoch - 2
all_metrics["roberta_01"]["Training Epochs"] = (
    trainer_roberta.callbacks[4].stopped_epoch - 2
)
all_metrics["distilbert_02"]["Training Epochs"] = (
    trainer_distilbert_02.callbacks[4].stopped_epoch - 2
)
all_metrics["distilbert_03"]["Training Epochs"] = (
    trainer_distilbert_03.callbacks[4].stopped_epoch - 2
)

Save the best model and metrics:

joblib.dump(
    (model_distilbert_02, data_loader_distilbert_02, trainer_distilbert_02),
    "temp/models/distilbert_02.joblib",
)

joblib.dump(all_metrics, "temp/all_metrics.joblib")

joblib.dump((true_labels_distilbert_02, preds_distilbert_02), "temp/test_preds.joblib")

Load saved model and metrics:

Show the code
model_distilbert_02, data_loader_distilbert_02, trainer_distilbert_02 = joblib.load(
    "temp/models/distilbert_02.joblib"
)
all_metrics = joblib.load("temp/all_metrics.joblib")

true_labels_distilbert_02, preds_distilbert_02 = joblib.load("temp/test_preds.joblib")

4 Model Comparison

Due to the label imbalance, the models are compared using the area under the precision-recall curve (AUC-PR) averaged over the labels.
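
For reference, the metric can be computed per label with scikit-learn's average_precision_score and then macro-averaged; whether aux.evaluate_model uses exactly this formulation is an assumption:

from sklearn.metrics import average_precision_score

# Average precision summarizes the precision-recall curve for each label;
# the macro average over the six labels is the value compared below.
auc_pr_per_label = [
    average_precision_score(
        true_labels_distilbert_02[:, i].numpy(),
        preds_distilbert_02[:, i].numpy(),
    )
    for i in range(6)
]
macro_auc_pr = float(np.mean(auc_pr_per_label))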

AUC-PR comparison between experiments:

Show the code
experiment_params = [
    "DistilBERT, Learning r = 5e-4, Batch size = 16,",
    "RoBERTa, Learning r = 5e-4, Batch size = 16,",
    "DistilBERT Learning r = 5e-5 Batch size = 8,",
    "DistilBERT Learning r = 5e-5 Batch size = 8, Dropout = 0.25,",
]
fig_auc_pr_compare, ax_aurc_pr_compare = plt.subplots(figsize=BASE_FIG_SIZE)
for i, metrics in enumerate(all_metrics.values()):
    sns.barplot(
        y=[i],
        x=[metrics["AUC-PR"]],
        orient="h",
        ax=ax_aurc_pr_compare,
        color=sns.color_palette()[0],
    )
    ax_aurc_pr_compare.annotate(
        f"{experiment_params[i]} Training Epochs: {metrics['Training Epochs']}",
        (0.631, i),
    )
ax_aurc_pr_compare.set_xlim(0.63, 0.695)
ax_aurc_pr_compare.set_yticks([])
ax_aurc_pr_compare.set_xlabel("Average Area Under the Precision Recall Curve")
plt.show()

The model from Experiment 3 is chosen for further evaluation.

5 Model Evaluation

Plotting the classification metrics for each label with different decision thresholds and selecting optimal values:

Show the code
fig_thresholds, ax_thresholds = plt.subplots(
    2, 3, figsize=(BASE_FIG_SIZE[0], BASE_FIG_SIZE[1] * 1.5)
)
ax_thresholds = ax_thresholds.flatten()
f1s_all_labels = []
best_thresholds = []
for i, ax in enumerate(ax_thresholds):
    predicted_probs = preds_distilbert_02[:, i]
    true_labels = true_labels_distilbert_02[:, i]
    label = train_data.columns[-6:][i]
    thresholds = np.linspace(0.0, 1, num=50)
    precisions, recalls, f1_scores = [], [], []
    for threshold in thresholds:
        binary_preds = predicted_probs >= threshold
        precisions.append(
            precision_score(true_labels, binary_preds, zero_division=np.nan)
        )
        recalls.append(recall_score(true_labels, binary_preds, zero_division=np.nan))
        f1_scores.append(f1_score(true_labels, binary_preds, zero_division=np.nan))
    max_f1 = np.nanmax(f1_scores)
    f1s_all_labels.append(max_f1)
    best_thresholds.append(thresholds[np.nanargmax(f1_scores)])
    ax.plot(thresholds, precisions, label="Precision")
    ax.plot(thresholds, recalls, label="Recall")
    ax.plot(thresholds, f1_scores, label="F1")
    ax.set_xlabel("Threshold")
    ax.set_ylabel("Score")
    ax.set_title(f"{label}")
    ax.legend(loc="lower right")
    ax.set_ylim((0, 1.18))
    ax.annotate(
        f"Support: {int(true_labels.sum())}\nMax F1-score: {max_f1:.2f}",
        (0, 1.02),
    )
plt.tight_layout()
plt.show()
preds_updated_thresholds = preds_distilbert_02 > torch.Tensor(best_thresholds)

The model strongly favours recall over a wide range of decision thresholds. To optimize the F1-score (increasing precision with minimal loss of recall), the thresholds for each class therefore had to be increased.

Classification report with optimal thresholds:

Show the code
print(
    classification_report(
        true_labels_distilbert_02, preds_updated_thresholds, zero_division=np.nan
    )
)
              precision    recall  f1-score   support

           0       0.84      0.77      0.80      3059
           1       0.42      0.69      0.52       319
           2       0.83      0.82      0.83      1690
           3       0.47      0.61      0.53        95
           4       0.71      0.82      0.76      1576
           5       0.52      0.61      0.56       281

   micro avg       0.75      0.78      0.77      7020
   macro avg       0.63      0.72      0.67      7020
weighted avg       0.77      0.78      0.77      7020
 samples avg       0.71      0.70      0.85      7020

Classification metrics for a binary problem (benign vs toxic):

Show the code
print(
    classification_report(
        (true_labels_distilbert_02.sum(dim=1) > 0),
        preds_updated_thresholds.sum(dim=1) > 0,
    )
)
              precision    recall  f1-score   support

       False       0.97      0.99      0.98     28681
        True       0.86      0.77      0.81      3232

    accuracy                           0.96     31913
   macro avg       0.92      0.88      0.90     31913
weighted avg       0.96      0.96      0.96     31913

5.1 Model Interpretability

To gain insight into how the model makes its decisions, the LIME explanation package is used.

LIME explanations for several toxic comments:

Show the code
toxic_indices = (
    test_loader_distilbert_02.dataset.labels.with_columns(
        pl.Series(np.arange(len(test_loader_distilbert_02.dataset.labels))).alias(
            "index"
        )
    )
    .filter(test_loader_distilbert_02.dataset.labels.sum_horizontal() > 0)
    .sample(3, seed=1)["index"]
    .to_list()
)

for i in toxic_indices:
    text = test_loader_distilbert_02.dataset.texts[i]
    explainer = LimeTextExplainer(class_names=train_data.columns[-6:])
    explanation = explainer.explain_instance(
        text,
        lambda x: aux.predict_probabilities(x, model=model_distilbert_02),
        labels=np.arange(6),
    )
    print("True labels:")
    display(test_loader_distilbert_02.dataset.labels[i])
    print("Explanation:")
    explanation.show_in_notebook()
True labels:
shape: (1, 6)
toxic severe_toxic obscene threat insult identity_hate
i64 i64 i64 i64 i64 i64
1 0 1 0 1 0
Explanation:
True labels:
shape: (1, 6)
toxic severe_toxic obscene threat insult identity_hate
i64 i64 i64 i64 i64 i64
1 0 0 0 0 0
Explanation:
True labels:
shape: (1, 6)
toxic severe_toxic obscene threat insult identity_hate
i64 i64 i64 i64 i64 i64
1 0 1 0 1 0
Explanation:

The explainer highlights the most important words in each comment for a specific label. Yet it is far more interesting to see the interpretations in cases where mistakes were made.

Model explanation for mistaken benign comment:

Show the code
error_index = (
    test_loader_distilbert_02.dataset.labels.with_columns(
        pl.Series(np.arange(len(test_loader_distilbert_02.dataset.labels))).alias(
            "index"
        )
    )
    .filter(
        (test_loader_distilbert_02.dataset.labels.sum_horizontal() == 0)
        & pl.Series(preds_updated_thresholds.sum(dim=1).numpy() > 0)
    )
    .sample(1, seed=1)["index"]
    .item()
)

text = test_loader_distilbert_02.dataset.texts[error_index]
explanation = explainer.explain_instance(
    text,
    lambda x: aux.predict_probabilities(x, model=model_distilbert_02),
    labels=np.arange(6),
)
print("True labels:")
display(test_loader_distilbert_02.dataset.labels[error_index])
print("Explanation:")
explanation.show_in_notebook()
True labels:
shape: (1, 6)
toxic severe_toxic obscene threat insult identity_hate
i64 i64 i64 i64 i64 i64
0 0 0 0 0 0
Explanation:

It is evident that certain keywords strongly outweigh any contextual information in the comment.

Model explanation for missed toxic comment:

Show the code
error_index2 = (
    test_loader_distilbert_02.dataset.labels.with_columns(
        pl.Series(np.arange(len(test_loader_distilbert_02.dataset.labels))).alias(
            "index"
        )
    )
    .filter(
        (test_loader_distilbert_02.dataset.labels.sum_horizontal() > 0)
        & pl.Series(preds_updated_thresholds.sum(dim=1).numpy() == 0)
    )
    .sample(1, seed=2)["index"]
    .item()
)

text = test_loader_distilbert_02.dataset.texts[error_index2]
explanation = explainer.explain_instance(
    text,
    lambda x: aux.predict_probabilities(x, model=model_distilbert_02),
    labels=np.arange(6),
)
print("True labels:")
display(test_loader_distilbert_02.dataset.labels[error_index2])
print("Explanation:")
explanation.show_in_notebook()
True labels:
shape: (1, 6)
toxic severe_toxic obscene threat insult identity_hate
i64 i64 i64 i64 i64 i64
1 0 0 0 0 0
Explanation:

Misclassification of toxic comments seems to be less of a problem, as the predicted toxicity probabilities were still fairly high; a higher recall can therefore be achieved by lowering the decision thresholds if needed.

5.2 Kaggle Score

The following cells were used to make a late submission to Kaggle, which achieved a score of 0.979.

Loading Test Data:

Show the code
test_data = pl.read_csv("data/test.csv")
for i in range(6):
    test_data = test_data.with_columns(pl.zeros(len(test_data)).alias(str(i)))

Making predictions:

Show the code
test_dataset = CommentDataModule.CommentDataset(
    test_data, test_loader_distilbert_02.dataset.tokenizer, 166
)
testset_loader = torch.utils.data.DataLoader(test_dataset, batch_size=8, num_workers=0)
model_distilbert_02.eval()
test_predicts = trainer_distilbert_02.predict(model_distilbert_02, testset_loader)
test_predicts = torch.cat([batch.sigmoid() for batch in test_predicts], dim=0)
clear_output()

Exporting in submission format:

Show the code
submission = pl.concat(
    [
        pl.DataFrame(test_data["id"]),
        pl.DataFrame(test_predicts.numpy(), schema=toxicity_labels),
    ],
    how="horizontal",
)
submission.write_csv("temp/test_submission.csv")

6 Conclusions

  • The model can label comment toxicity with an AUC-PR of 0.69.
  • The model prioritizes recall; achieving high precision is more challenging.
  • Mistakes made by the model on benign comments may stem from its difficulty in accurately evaluating context.

6.1 Further Improvements

  • Increase the amount of training data, including a greater number of benign comments, to enhance context detection.
  • Experiment with various tokenizer max-length parameter values to optimize performance.
  • Explore different base models for the underlying architecture.
  • Enhance the training process by incorporating techniques such as layer unfreezing and learning rate scheduling.